Access to the activity of subcortical structures offers a unique opportunity for building intent-dependent brain-computer interfaces, renders abundant options for exploring a broad range of cognitive phenomena in the realm of affective neuroscience, including complex decision-making processes and the eternal free-will dilemma, and facilitates diagnostics of a range of neurological diseases. So far this has been possible only using bulky, expensive and immobile fMRI equipment. Here we present an interpretable, domain-grounded solution to recover the activity of several subcortical regions from multichannel EEG data and demonstrate up to 60% correlation between the actual subcortical blood oxygenation level dependent (sBOLD) signal and its EEG-derived twin. Then, using a novel and theoretically justified weight interpretation methodology, we recover individual spatial and time-frequency patterns of scalp EEG predictive of the hemodynamic signal in the subcortical nuclei. The described results not only pave the road towards wearable subcortical activity scanners but also showcase an automatic knowledge discovery process facilitated by deep learning technology in combination with an interpretable, domain-constrained architecture and the appropriate downstream task.
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems for the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) output allows SLU systems to improve over the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup and a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative for overcoming the limitations of working with purely automatically generated textual data.
Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, use a fast convolutional decoder, and amortize the effort of building teacher representations. data2vec 2.0 benefits from the rich contextualized target representations introduced in data2vec, which enable a fast self-supervised learner. Experiments on ImageNet-1K image classification show that data2vec 2.0 matches the accuracy of Masked Autoencoders with 16.4x less pre-training time; on Librispeech speech recognition it performs as well as wav2vec 2.0 in 10.6x less time; and on GLUE natural language understanding it matches a retrained RoBERTa model in half the time. Trading some speed for accuracy yields an ImageNet-1K top-1 accuracy of 86.8% with a ViT-L model trained for 150 epochs.
Contrastive methods have led to a recent surge in the performance of self-supervised representation learning (SSL). Recent methods such as BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including negative examples, that do not affect downstream performance. These "non-contrastive" methods work surprisingly well without using negatives, even though collapse to the global minimum looms. We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size. In particular, SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to predict downstream task performance without any fine-tuning or labels. We further analyze architectural design choices and their effect on downstream performance. Finally, we demonstrate that shifting to a continual learning setting acts as a regularizer and prevents collapse, and that a hybrid of continual and multi-epoch training can improve linear probe accuracy by as much as 18 percentage points using ResNet-18 on ImageNet.
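The abstract does not spell out the collapse metric; one common proxy for partial dimensional collapse is the effective rank of a batch of embeddings, i.e., the exponentiated Shannon entropy of the normalized singular-value spectrum. A minimal sketch, assuming that proxy (not necessarily the paper's exact definition):

```python
import numpy as np

def effective_rank(embeddings: np.ndarray) -> float:
    """Effective rank of an (n_samples, dim) embedding matrix:
    exp of the Shannon entropy of the normalized singular-value
    spectrum. Full collapse -> near 1; no collapse -> near dim."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                      # drop numerically-zero modes
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy comparison: isotropic embeddings vs. embeddings confined
# to a 4-dimensional subspace of a 64-dimensional space.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(512, 64))
collapsed = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 64))
```

A low effective rank relative to the embedding dimension would flag the partial collapse described above, with no labels or fine-tuning required.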
We explore a data-driven approach to learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model over the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired metric. At test time, it can optimize neural networks with unseen parameters in a single update. We find that our method successfully generates parameters for a wide range of loss prompts. Moreover, it can sample multimodal parameter solutions and has favorable scaling properties. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
We train a deep learning artificial neural network model, a Spatial Attention U-Net, to recover useful ionospheric signals from noisy ionogram data measured by the vertical-incidence pulsed ionospheric radar at Hualien. Our results show that the model can well identify the F2-layer ordinary and extraordinary modes (F2o, F2x) as well as the combined signal of the E layer (ordinary and extraordinary modes plus sporadic Es). The model is also able to identify some unlabeled signals. The model's performance can be significantly degraded by an insufficient number of samples in the dataset. From the recovered signals, we determine the critical frequencies of F2o and F2x and the intersection frequency between the two signals. The difference between the two critical frequencies is 0.63 MHz, with an uncertainty of 0.18 MHz.
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize the improvement in terms of a bias-variance trade-off.
Watermarks should be introduced into the natural language outputs of AI systems in order to maintain the distinction between human- and machine-generated text. The ethical imperative not to blur this distinction arises from the asemantic nature of large language models and from human projections of emotional and cognitive states onto machines, which may lead to manipulation, the spread of falsehoods, or emotional distress. Enforcing this distinction requires unobtrusive yet easily accessible marks of machine origin. We propose implementing a code based on equidistant letter sequences. While such codes are absent from humanly written texts, their appearance in machine-generated ones would, for ethical reasons, be helpful.
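Detecting such a mark amounts to searching for a fixed codeword whose letters occur at a constant skip in the letters-only text. A minimal detector sketch (`find_els` and its parameters are illustrative, not from the paper):

```python
import re

def find_els(text: str, word: str, max_skip: int = 50):
    """Search for `word` as an equidistant letter sequence: its letters
    occurring at a constant skip in the letters-only lowercase text.
    Returns (start, skip) of the first hit, or None."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    word = word.lower()
    n, k = len(letters), len(word)
    for skip in range(1, max_skip + 1):
        for start in range(n - (k - 1) * skip):
            if all(letters[start + i * skip] == word[i] for i in range(k)):
                return start, skip
    return None
```

A generator would steer word choice so that the codeword appears at some fixed skip; a verifier would run a search like the one above to recover the mark.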
In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the impact of different training distributions on the quality of generated images and on the racial distributions of the generated images. Our results show that the racial compositions of generated images successfully preserve those of the training data. However, we observe that truncation, a technique used to generate higher-quality images during inference, exacerbates racial imbalances in the data. Finally, when examining the relationship between image quality and race, we find that the highest perceived visual quality images of a given race come from a distribution where that race is well represented, and that annotators consistently prefer generated images of White people over those of Black people.
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific fine-tuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce an output image consistent with the given examples. We show that posing this problem as simple image inpainting, literally just filling in a hole in a concatenated visual prompt image, turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked autoencoders on a new dataset that we curated: 88k unlabeled figures from academic paper sources on Arxiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, edge detection, and more.
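The prompt-construction step can be sketched as assembling a 2x2 canvas whose bottom-right quadrant is the hole the inpainting model must fill. A simplified single-channel sketch; `make_visual_prompt` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def make_visual_prompt(ex_in, ex_out, query):
    """Assemble the 'visual prompt' canvas: example input/output on top,
    query bottom-left, and a masked hole bottom-right for the inpainter
    to fill with the task applied to the query."""
    h, w = ex_in.shape
    canvas = np.zeros((2 * h, 2 * w), dtype=ex_in.dtype)
    hole = np.zeros_like(canvas, dtype=bool)
    canvas[:h, :w] = ex_in       # task example: input
    canvas[:h, w:] = ex_out      # task example: output
    canvas[h:, :w] = query       # new input
    hole[h:, w:] = True          # region the inpainter must predict
    return canvas, hole

ex_in = np.ones((4, 4))
ex_out = 2 * np.ones((4, 4))
query = 3 * np.ones((4, 4))
canvas, hole = make_visual_prompt(ex_in, ex_out, query)
```

An inpainting model trained on figure-like data then fills the hole, and the filled quadrant is read off as the task output for the query image.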